Introduction

It’s a common use case when working with geo-spatial data to augment one spatial dataset with information from another spatial dataset – this can be done via spatial joins. In this post for example, I will show how to augment a dataset containing school locations with socioeconomic data of their surrounding statistical region. This approach has the drawback that the surrounding statistical region doesn’t reflect the actual catchment area of the school. I will present an approach based on intersection with Voronoi regions which should more accurately represent the socioeconomic characteristics of the population around a school.

Data

For this example, I’d like to compare child poverty (more accurately the percentage of children whose parents obtain social welfare) in the neighborhood regions around public and private primary schools in Berlin. This blog post concentrates on how to join the point samples (the schools) with the surrounding statistical regions, so I will present only a few summary statistics in the end since proper spatial modeling is beyond the scope of this blog post.

We will work with three datasets: The first spatial dataset contains the shape of the statistical regions in Berlin, the second dataset contains the socioeconomic data for these regions and the third dataset contains the locations and other attributes of primary schools in Berlin.

All data and the code are available in the GitHub repository. We will use the sf package for working with spatial data in R, dplyr for data management and ggplot2 for a few more advanced visualizations, i.e. when base plot() is not sufficient:

library(sf)
library(dplyr)
library(ggplot2)

Socioeconomic data for statistical regions

We will at first load a dataset with the most granular statistical regions for Berlin, called Planungsräume (planning areas). We select the area ID and name as spatial attributes. The result is a spatial dataframe (a simple feature (sf) collection).

bln_plan <- read_sf('data/berlin_plr.shp') %>%
  mutate(areaid = as.integer(SCHLUESSEL)) %>%   # transform character SCHLUESSEL to numeric area ID
  select(areaid, name = PLR_NAME)
bln_plan
## Simple feature collection with 447 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 447 x 3
##     areaid name                                                         geometry
##      <int> <chr>                                              <MULTIPOLYGON [m]>
##  1 1011101 Stülerstr.      (((387256.6 5818552, 387323.1 5818572, 387418.9 5818…
##  2 1011102 Großer Tiergar… (((386767.5 5819393, 386768.3 5819389, 386769.6 5819…
##  3 1011103 Lützowstr.      (((387952.6 5818275, 387986.7 5818313, 387994.6 5818…
##  4 1011104 Körnerstr.      (((388847.1 5817875, 388855.5 5817899, 388865.1 5817…
##  5 1011105 Nördlicher Lan… (((388129.5 5819015, 388157.1 5819017, 388170.8 5819…
##  6 1011201 Wilhelmstr.     (((389845.7 5819286, 389840.9 5819311, 389846.1 5819…
##  7 1011202 Unter den Lind… (((389851.2 5819673, 389847.5 5819682, 389846.3 5819…
##  8 1011203 Linden Süd      (((390147.6 5819688, 390307.9 5819702, 390458.4 5819…
##  9 1011204 Leipziger Str.  (((390523.4 5819192, 390705.4 5819207, 390727.2 5819…
## 10 1011301 Charitéviertel  (((389526.5 5820432, 389528.2 5820499, 389528.9 5820…
## # … with 437 more rows

When printing this dataframe, the header reveals another important information: The coordinate reference system (CRS) of this dataset is ETRS89 / UTM zone 33N. We will later need to make sure that the coordinates of the school locations and the coordinates of the planning areas use the same coordinate system.

This data can be joined with socioeconomic information provided from official sources. Luckily, Helbig/Salomo 2021 compiled these information for many cities in Germany (available for download) among which is data for Berlin from 2020. I’ve created an excerpt with percentages of residents receiving social welfare (welfare) and percentage of children under 15 years whose parents receive social welfare (welfare_chld):

(bln_welfare <- read.csv('data/berlin_welfare.csv', stringsAsFactors = FALSE))
##       areaid                             areaname welfare welfare_chld
## 1    1011101                         Stülerstraße   10.09        15.44
## 2    1011102                    Großer Tiergarten    4.76         0.00
## 3    1011103                         Lützowstraße   22.21        36.80
## 4    1011104                         Körnerstraße   24.81        42.14
## 5    1011105             Nördlicher Landwehrkanal    2.82         3.53
## 6    1011201                        Wilhelmstraße   12.13        19.03
## 7    1011202                Unter den Linden Nord    3.35        11.11
## 8    1011203                 Unter den Linden Süd    6.59         6.25
## 9    1011204                     Leipziger Straße   10.28        19.91
## 10   1011301                       Charitéviertel    3.68         3.92
## 11   1011302                 Oranienburger Straße    8.16        10.44
## 12   1011303                Alexanderplatzviertel   11.69        19.64
## 13   1011304                      Karl-Marx-Allee   23.54        36.33
## 14   1011305                   Heine-Viertel West   10.07        17.00
## 15   1011306                    Heine-Viertel Ost    9.67        15.43
## 16   1011401                      Invalidenstraße    6.29         7.70
## 17   1011402                          Arkonaplatz    4.46         3.53
## 18   1022101                           Huttenkiez   32.74        66.92
## 19   1022102                          Beusselkiez   29.07        51.86
## 20   1022103                            Westhafen   57.30        60.94
## 21   1022104                       Emdener Straße   23.48        40.66
## 22   1022105                        Zwinglistraße   29.85        50.84
## 23   1022106                   Elberfelder Straße   12.41        16.22
## 24   1022201                          Stephankiez   20.31        34.06
## 25   1022202                          Heidestraße   23.48        33.33
## 26   1022203                      Lübecker Straße   36.69        49.52
## 27   1022204                      Thomasiusstraße   11.10        13.83
## 28   1022205                        Zillesiedlung   36.66        37.44
## 29   1022206                    Lüneburger Straße   12.11        13.47
## 30   1022207                         Hansaviertel    8.86        14.65
## 31   1033101                      Soldiner Straße   46.52        65.26
## 32   1033102                        Gesundbrunnen   36.76        54.68
## 33   1033201                        Brunnenstraße   45.94        54.89
## 34   1033202                     Humboldthain Süd   33.11        45.58
## 35   1033203                Humboldthain Nordwest   41.51        60.51
## 36   1044101                             Rehberge   28.63        48.26
## 37   1044102                         Schillerpark   29.37        45.61
## 38   1044103               Westliche Müllerstraße   27.26        53.81
## 39   1044201               Reinickendorfer Straße   42.91        64.20
## 40   1044202                           Sparrplatz   26.80        50.13
## 41   1044203                         Leopoldplatz   37.25        58.92
## 42   2010101                    Askanischer Platz   28.38        39.81
## 43   2010102                         Mehringplatz   41.46        51.81
## 44   2010103                          Moritzplatz   49.00        59.77
## 45   2010104                       Wassertorplatz   45.07        59.56
## 46   2020201      Gleisdreieck/Entwicklungsgebiet    4.55         6.45
## 47   2020202                  Rathaus Yorckstraße   19.39        27.30
## 48   2020203                         Viktoriapark   13.26        17.78
## 49   2020204                          Urbanstraße   14.21        19.21
## 50   2020205                         Chamissokiez   13.34        18.42
## 51   2020206                           Graefekiez   20.68        26.98
## 52   2030301                         Oranienplatz   33.43        49.48
## 53   2030302                      Lausitzer Platz   26.05        33.86
## 54   2030401                 Reichenberger Straße   20.36        28.02
## 55   2030402                          Wrangelkiez   18.89        23.93
## 56   2040501                           Barnimkiez   25.00        35.70
## 57   2040502                        Friedenstraße   14.66        23.31
## 58   2040503                Richard-Sorge-Viertel   10.18        13.98
## 59   2040701                       Andreasviertel   21.96        38.55
## 60   2040702                           Weberwiese    8.51         8.65
## 61   2040703 Wriezener Bahnhof/Entwicklungsgebiet   30.24        58.54
## 62   2050601                      Hausburgviertel   12.94        17.19
## 63   2050602                     Samariterviertel   11.66        13.34
## 64   2050801                           Traveplatz   15.85        23.06
## 65   2050802                     Boxhagener Platz    9.40        11.78
## 66   2050803                       Stralauer Kiez   11.81        18.47
## 67   2050804                  Stralauer Halbinsel    4.19         2.81
## 68   3010101                         Bucher Forst   14.40        12.61
## 69   3010102                                 Buch   28.01        31.88
## 70   3010104                        Lietzengraben   33.67        22.22
## 71   3020203                         Blankenfelde    6.66         5.57
## 72   3020209                    Niederschönhausen    7.24         6.42
## 73   3020210                          Herthaplatz   10.25        11.11
## 74   3020307                             Buchholz    8.05         9.99
## 75   3030405                           Karow Nord   16.56        20.40
## 76   3030406                            Alt-Karow    3.64         2.57
## 77   3030711                          Blankenburg    2.86         1.96
## 78   3030715                          Heinersdorf    7.55         7.46
## 79   3030716                          Märchenland    5.08         6.40
## 80   3040508                            Rosenthal    5.52         5.93
## 81   3040512                          Wilhelmsruh    7.34         9.09
## 82   3040513                            Schönholz    5.22         3.36
## 83   3040614                       Pankow Zentrum   10.59         9.21
## 84   3040818                           Pankow Süd   11.01        12.11
## 85   3050919                  Gustav-Adolf-Straße   21.93        28.80
## 86   3050920                           Weißer See   10.68         9.83
## 87   3050923                    Weißenseer Spitze   11.62        11.09
## 88   3050924                         Behaimstraße   15.55        14.53
## 89   3050925         Komponistenviertel Weißensee   13.37        13.39
## 90   3051017                       Rennbahnstraße   15.55        17.91
## 91   3051021                           Buschallee   18.16        25.76
## 92   3051022                          Hansastraße   16.28        24.96
## 93   3061126                           Arnimplatz    9.41         9.10
## 94   3061131                            Falkplatz    6.92         5.75
## 95   3061227                          Humannplatz    6.97         6.13
## 96   3061228                 Erich-Weinert-Straße   13.07        20.05
## 97   3061332                       Helmholtzplatz    8.42         7.02
## 98   3061429                  Greifswalder Straße   23.82        35.80
## 99   3061430            Volkspark Prenzlauer Berg   37.89        34.93
## 100  3061434                   Anton-Saefkow-Park   10.58         8.50
## 101  3061435                Conrad-Blenkle-Straße   16.62        24.89
## 102  3061441                      Eldenaer Straße    1.07         0.64
## 103  3071536                    Teutoburger Platz    6.52         6.98
## 104  3071537                        Kollwitzplatz    5.89         3.82
## 105  3071633                         Thälmannpark   18.75        19.77
## 106  3071638                           Winsstraße    7.13         6.37
## 107  3071639                         Bötzowstraße    6.86         4.88
## 108  4010101                        Jungfernheide   30.69        46.39
## 109  4010102                           Plötzensee   35.89        77.00
## 110  4010103                  Paul-Hertz-Siedlung   45.22        52.32
## 111  4020204                       Olympiagelände    0.00         0.00
## 112  4020205                    Siedlung Ruhleben    4.98         7.33
## 113  4020206                    Angerburger Allee    8.38        13.58
## 114  4020207                          Flatowallee    5.21         7.64
## 115  4020208                           Kranzallee    5.69         4.81
## 116  4020209                             Eichkamp    2.27         2.96
## 117  4020310                         Park Ruhwald   10.45        48.28
## 118  4020311                         Reichsstraße    7.90         9.49
## 119  4020312                      Branitzer Platz    7.10         7.51
## 120  4020313             Königin-Elisabeth-Straße   11.90        19.64
## 121  4020314                         Messegelände    0.00         0.00
## 122  4030415                         Schloßgarten   21.79        34.06
## 123  4030416                       Klausenerplatz   19.22        26.51
## 124  4030417                         Schloßstraße   12.43        17.43
## 125  4030518                          Tegeler Weg   17.00        28.10
## 126  4030519               Kaiserin-Augusta-Allee   18.67        33.19
## 127  4030620                          Alt-Lietzow   13.92        21.84
## 128  4030621                           Spreestadt   16.15        29.77
## 129  4030622                Richard-Wagner-Straße   16.41        22.80
## 130  4030623                   Ernst-Reuter-Platz   17.31        24.87
## 131  4030724                           Lietzensee   10.83        13.25
## 132  4030725                    Amtsgerichtsplatz   10.29        17.08
## 133  4030726                        Droysenstraße   13.82        24.81
## 134  4030827                    Karl-August-Platz   14.60        22.58
## 135  4030828                         Savignyplatz    8.03        13.34
## 136  4030929                       Hindemithplatz    6.68         6.98
## 137  4030930                   George-Grosz-Platz    7.08        10.07
## 138  4030931                     Breitscheidplatz   11.33        20.21
## 139  4031032                             Halensee   10.11        14.61
## 140  4041133               Güterbahnhof Grunewald    8.33         0.00
## 141  4041134                        Bismarckallee    6.95         7.60
## 142  4041135                           Hundekehle    3.64         2.47
## 143  4041136                           Hagenplatz    4.67         4.90
## 144  4041137                    Flinsberger Platz   11.36        16.41
## 145  4041238                     Kissinger Straße   11.03        11.20
## 146  4041239                  Stadion Wilmersdorf   43.46        25.00
## 147  4041240                           Messelpark    2.79         2.52
## 148  4041241                        Breite Straße    6.18         4.65
## 149  4041342                Schlangenbader Straße   15.82        18.66
## 150  4041343                        Binger Straße    8.09         8.65
## 151  4041344                    Rüdesheimer Platz    7.20         6.94
## 152  4051445                      Eisenzahnstraße   15.37        23.83
## 153  4051446                          Preußenpark    7.62         7.88
## 154  4051447                     Ludwigkirchplatz   10.88        14.89
## 155  4051448                        Schaperstraße   11.28        17.22
## 156  4051549                  Rathaus Wilmersdorf   19.50        31.79
## 157  4051550                    Leon-Jessel-Platz   11.07        17.15
## 158  4051551                      Brabanter Platz   13.60        22.38
## 159  4051652                   Nikolsburger Platz    9.92        13.09
## 160  4051653                         Prager Platz   10.36        12.13
## 161  4051654                          Wilhelmsaue   10.38        13.03
## 162  4051655                  Babelsberger Straße   12.17        13.21
## 163  4051656                      Hildegardstraße    8.19         9.86
## 164  4061757                      Forst Grunewald    0.00         0.00
## 165  5010101                      Hakenfelde Nord   18.25        24.16
## 166  5010102                          Goltzstraße   26.10        39.79
## 167  5010103                       Amorbacher Weg   14.12        18.33
## 168  5010204                     Griesingerstraße   38.94        37.30
## 169  5010205                        An der Tränke    5.26         9.01
## 170  5010206                      Gütersloher Weg   40.41        51.34
## 171  5010207                          Darbystraße   42.82        49.57
## 172  5010208                  Germersheimer Platz   36.67        50.75
## 173  5010209                         An der Kappe   16.05        22.76
## 174  5010310                           Eckschanze   31.06        43.47
## 175  5010311                            Eiswerder   39.68        48.25
## 176  5010312                            Kurstraße   43.94        59.70
## 177  5010313                          Ackerstraße   30.51        42.67
## 178  5010314                   Carl-Schurz-Straße   28.37        41.44
## 179  5010339                             Freiheit   15.40         7.84
## 180  5020415                       Isenburger Weg    3.55         4.90
## 181  5020416                         Am Heideberg   10.84        10.46
## 182  5020417                     Staakener Straße   14.25        18.41
## 183  5020418                     Spandauer Straße   21.54        23.13
## 184  5020419                        Magistratsweg   36.66        46.97
## 185  5020420                           Werkstraße   11.05        15.24
## 186  5020521                       Döberitzer Weg    7.22         8.59
## 187  5020522                       Pillnitzer Weg   52.75        60.68
## 188  5020523                        Maulbeerallee   70.34        70.10
## 189  5020524                   Weinmeisterhornweg   12.29        17.11
## 190  5020625                      Borkumer Straße   24.64        39.33
## 191  5020626                           Adamstraße   25.54        37.42
## 192  5020627                           Tiefwerder   33.66        46.07
## 193  5020628                      Graetschelsteig   27.73        37.76
## 194  5020629                     Börnicker Straße    6.45         4.69
## 195  5030730                        Zitadellenweg   18.06        23.83
## 196  5030731                  Gartenfelder Straße   33.58        46.14
## 197  5030832                             Rohrdamm   26.50        39.10
## 198  5030833                         Motardstraße   24.36        55.77
## 199  5040934                            Alt-Gatow   13.40        20.88
## 200  5040935                  Groß-Glienicker Weg   16.37         9.09
## 201  5040936                           Jägerallee    3.37         2.87
## 202  5040937                        Kladower Damm    1.63         1.16
## 203  5040938                          Kafkastraße    3.75         5.05
## 204  6010101                          Fichtenberg    4.51         4.07
## 205  6010102                           Schloßstr.   10.22        12.95
## 206  6010103                         Markelstraße    7.00         7.78
## 207  6010204                          Munsterdamm   12.59        16.09
## 208  6010205                              Südende   12.45        17.92
## 209  6010206                            Stadtpark   10.85        13.77
## 210  6010207                         Mittelstraße   11.46        16.37
## 211  6010208                           Bergstraße   10.06        11.33
## 212  6010209                      Feuerbachstraße   10.67        14.11
## 213  6010210                       Bismarckstraße   11.84        15.31
## 214  6020301                         Alt-Lankwitz   12.56        16.70
## 215  6020302          Komponistenviertel Lankwitz    7.81        11.13
## 216  6020303                      Lankwitz Kirche   17.50        24.18
## 217  6020304                Kaiser-Wilhelm-Straße   16.52        22.69
## 218  6020305                Gemeindepark Lankwitz   17.53        23.38
## 219  6020306                         Lankwitz Süd    8.76        12.53
## 220  6020407                  Thermometersiedlung   31.20        42.81
## 221  6020408                     Lichterfelde Süd   17.84        24.09
## 222  6020409                  Königsberger Straße   11.13        15.72
## 223  6020410                      Oberhofer Platz    7.12         9.52
## 224  6020411                  Schütte-Lanz-Straße   12.80        19.76
## 225  6030501                        Berlepschstr.    8.49        11.80
## 226  6030502                       Zehlendorf Süd   13.19        18.42
## 227  6030503                     Zehlendorf Mitte    7.24         9.09
## 228  6030504                        Teltower Damm    4.68         4.66
## 229  6030605                   Botanischer Garten    8.38         9.13
## 230  6030606                       Hindenburgdamm   12.71        15.92
## 231  6030607                           Goerzwerke    7.26        10.99
## 232  6030608                    Schweizer Viertel    6.25         6.06
## 233  6030609                         Augustaplatz    9.41        11.81
## 234  6030610                    Lichterfelde West    4.63         4.54
## 235  6040701                              Wannsee    4.39         4.30
## 236  6040702                               Düppel    5.41         5.45
## 237  6040703                           Nikolassee    3.74         3.64
## 238  6040804                         Krumme Lanke    2.48         1.86
## 239  6040805                  Fischerhüttenstraße    4.44         4.37
## 240  6040806                             Fischtal    4.75         4.94
## 241  6040807                     Zehlendorf Eiche    7.13         6.60
## 242  6040808                            Hüttenweg    2.68         3.12
## 243  6040809                           Thielallee    1.48         1.54
## 244  6040810                               Dahlem    3.47         4.02
## 245  7010101 Wittenbergplatz/Viktoria-Luise-Platz   14.33        20.39
## 246  7010102                      Nollendorfplatz   26.83        44.04
## 247  7010103                      Barbarossaplatz   10.93        14.90
## 248  7010104                       Dennewitzplatz   26.50        38.59
## 249  7020201                    Bayerischer Platz   10.89        12.90
## 250  7020202        Volkspark (Rudolf-Wilde-Park)   17.28        24.47
## 251  7020203                 Kaiser-Wilhelm-Platz   17.37        24.73
## 252  7020204                   Schöneberger Insel   12.07        16.71
## 253  7030301                            Friedenau    7.06         7.92
## 254  7030302                       Ceciliengärten    9.13        11.98
## 255  7030303                         Grazer Platz   18.59        24.74
## 256  7040401                        Neu-Tempelhof   17.99        22.31
## 257  7040402                    Lindenhofsiedlung   20.16        24.81
## 258  7040403                     Manteuffelstraße   20.16        30.32
## 259  7040404                           Marienhöhe   16.11        23.61
## 260  7040405                    Rathaus Tempelhof   24.68        37.23
## 261  7040406                       Germaniagarten   38.72        58.14
## 262  7050501                        Rathausstraße   25.41        37.59
## 263  7050502                  Fritz-Werner-Straße   24.53        35.64
## 264  7050503                    Eisenacher Straße   19.60        29.35
## 265  7050504                            Imbrosweg   22.10        27.74
## 266  7050505                         Hundsteinweg   17.54        26.02
## 267  7050506                          Birnhornweg    7.37        12.31
## 268  7060601          Marienfelder Allee Nordwest   35.40        44.05
## 269  7060602                          Kirchstraße    7.52         8.96
## 270  7060603                  Marienfelde Nordost   28.51        40.20
## 271  7060604                      Marienfelde Süd   29.00        37.41
## 272  7070701      Kettinger Straße/Schillerstraße   12.47        16.64
## 273  7070702        Alt-Lichtenrade/Töpchiner Weg   13.81        17.88
## 274  7070703                    John-Locke-Straße   24.78        31.27
## 275  7070704                       Nahariyastraße   47.62        58.93
## 276  7070705           Franziusweg/Rohrbachstraße    7.24         7.76
## 277  7070706  Horstwalder Straße/Paplitzer Straße   17.60        23.37
## 278  7070707                  Wittelsbacherstraße   10.32        18.35
## 279  8010115                           Hasenheide   15.35        13.28
## 280  8010116                       Wissmannstraße   29.79        46.82
## 281  8010117                    Schillerpromenade   25.24        39.61
## 282  8010118                    Silbersteinstraße   39.28        60.71
## 283  8010211                      Flughafenstraße   27.12        46.12
## 284  8010212                             Rollberg   46.16        55.59
## 285  8010213                           Körnerpark   30.46        49.66
## 286  8010214                      Glasower Straße   38.53        59.05
## 287  8010301                           Reuterkiez   20.43        32.23
## 288  8010302                         Bouchéstraße   28.29        47.86
## 289  8010303                          Donaustraße   30.07        54.30
## 290  8010404                              Rixdorf   28.64        43.96
## 291  8010405                       Hertzbergplatz   24.77        41.04
## 292  8010406                Treptower Straße Nord   48.13        68.40
## 293  8010407             Gewerbegebiet Ederstraße   24.04        47.83
## 294  8010508                       Weiße Siedlung   63.14        66.34
## 295  8010509                      Schulenburgpark   70.79        74.87
## 296  8010510       Gewerbegebiet Köllnische Heide   39.33        77.19
## 297  8020619                  Buschkrugallee Nord   37.91        45.98
## 298  8020620                      Tempelhofer Weg   32.98        48.68
## 299  8020621                  Mohriner Allee Nord    3.97         6.17
## 300  8020622                      Parchimer Allee   20.29        28.99
## 301  8020623                           Ortolanweg   25.36        46.03
## 302  8020624                       Britzer Garten   12.83         9.88
## 303  8020625                  Handwerker-Siedlung   16.59        20.77
## 304  8020726                          Buckow West   13.45        19.39
## 305  8020727                         Buckow Mitte   19.90        29.12
## 306  8020728                           Buckow Ost   32.91        47.07
## 307  8030829                    Gropiusstadt Nord   42.21        55.63
## 308  8030830                     Gropiusstadt Süd   28.84        45.50
## 309  8030831                     Gropiusstadt Ost   37.54        45.67
## 310  8040932                      Goldhähnchenweg   38.36        46.10
## 311  8040933                     Vogelviertel Süd   14.84        26.07
## 312  8040934                    Vogelviertel Nord   16.39        22.10
## 313  8041035                        Blumenviertel    9.85        13.53
## 314  8041036                      Zittauer Straße    8.24        11.18
## 315  8041037                            Alt-Rudow   18.60        28.74
## 316  8041038              Waßmannsdorfer Chaussee    7.83        10.08
## 317  8041039                        Frauenviertel   12.98        18.70
## 318  8041040           Waltersdorfer Chaussee Ost   15.02        21.27
## 319  9010101                          Elsenstraße   16.28        16.82
## 320  9010102               Am Treptower Park Nord    5.56         0.00
## 321  9010201                Am Treptower Park Süd    9.33         9.20
## 322  9010202                Köpenicker Landstraße   17.27        30.83
## 323  9010301                    Baumschulenstraße   13.95        19.07
## 324  9010302                          Späthsfelde    4.80         7.28
## 325  9010401                    Johannisthal West   13.33        20.24
## 326  9010402                     Johannisthal Ost   12.38        15.36
## 327  9020501                 Oberschöneweide West   27.83        42.76
## 328  9020502                  Oberschöneweide Ost   22.91        31.00
## 329  9020601                      Schnellerstraße   23.90        37.70
## 330  9020602                            Oberspree   15.39        22.46
## 331  9020701                       Adlershof West    3.67         6.74
## 332  9020702                        Adlershof Ost   16.31        23.19
## 333  9020801                        Spindlersfeld   14.30        16.82
## 334  9020802                  Köllnische Vorstadt   30.08        41.05
## 335  9030901                    Dorf Altglienicke    6.53         8.37
## 336  9030902                        Wohngebiet II   36.92        48.77
## 337  9030903                       Kölner Viertel   20.39        24.62
## 338  9031001                            Bohnsdorf    9.28        11.90
## 339  9031101                               Grünau    8.80        10.39
## 340  9031201                         Karolinenhof    2.01         1.73
## 341  9031202         Schmöckwitz/Rauchfangswerder    6.25         9.47
## 342  9041301              Kietzer Feld/Nachtheide   12.77        16.75
## 343  9041302                         Wendenschloß    3.43         2.91
## 344  9041401                            Allende I   14.08        20.39
## 345  9041402                           Allende II   11.36        16.07
## 346  9041501                       Altstadt Kietz   25.34        32.20
## 347  9041601                           Müggelheim    3.68         4.54
## 348  9051701                         Hirschgarten   16.55        14.02
## 349  9051702                        Bölschestraße    7.63         7.58
## 350  9051801               Rahnsdorf/Hessenwinkel    5.89         5.67
## 351  9051901                         Dammvorstadt   11.40        11.89
## 352  9052001                        Köpenick Nord    9.21        10.06
## 353 10010101                         Marzahn West   33.42        43.71
## 354 10010102                       Havemannstraße   33.58        42.03
## 355 10010203    Gewerbegebiet Bitterfelder Straße   76.03        58.85
## 356 10010204                       Wuhletalstraße   33.00        40.88
## 357 10010205                          Marzahn Ost   29.61        41.48
## 358 10010206                       Ringkolonnaden   27.07        38.06
## 359 10010207                  Marzahner Promenade   27.61        35.75
## 360 10010308                   Marzahner Chaussee   13.08         9.15
## 361 10010309                          Springpfuhl   21.73        30.45
## 362 10010310                          Alt-Marzahn   21.34        28.01
## 363 10010311                      Landsberger Tor   11.49        15.83
## 364 10020412            Alte Hellersdorfer Straße   45.49        47.53
## 365 10020413                      Gut Hellersdorf   29.95        41.39
## 366 10020414                          Helle Mitte   26.51        38.08
## 367 10020415              Hellersdorfer Promenade   44.66        56.35
## 368 10020416                      Böhlener Straße   36.26        46.69
## 369 10020517                Adele-Sandrock-Straße   16.02        24.82
## 370 10020518                          Schleipfuhl   32.29        39.18
## 371 10020519             Boulevard Kastanienallee   41.37        47.02
## 372 10020620                              Nord II   20.80        37.08
## 373 10020621                       Gelbes Viertel   38.55        45.76
## 374 10020622                               Nord I   16.96        25.36
## 375 10020623                        Rotes Viertel   16.84        24.12
## 376 10030724                       Oberfeldstraße    2.05         2.15
## 377 10030725                        Buckower Ring   16.61        18.03
## 378 10030726                         Alt-Biesdorf    8.25        10.95
## 379 10030727                         Biesdorf Süd    1.82         1.99
## 380 10040828                       Kaulsdorf Nord    7.15         9.50
## 381 10040829                        Alt-Kaulsdorf    7.86         6.92
## 382 10040830                        Kaulsdorf Süd    2.22         2.19
## 383 10040931                       Mahlsdorf Nord    2.32         1.63
## 384 10040932                        Alt-Mahlsdorf    3.49         2.80
## 385 10040933                        Mahlsdorf Süd    2.45         2.52
## 386 11010101                         Dorf Malchow    8.01        12.86
## 387 11010102                      Dorf Wartenberg    2.82         3.24
## 388 11010103                      Dorf Falkenberg    8.42        10.84
## 389 11010204                       Falkenberg Ost   37.94        46.97
## 390 11010205                      Falkenberg West   28.28        38.82
## 391 11010206                       Wartenberg Süd   28.39        40.26
## 392 11010207                      Wartenberg Nord   30.61        38.63
## 393 11010308                  Zingster Straße Ost   31.77        42.14
## 394 11010309                 Zingster Straße West   36.65        49.50
## 395 11010310                          Mühlengrund   18.38        23.14
## 396 11020411                        Malchower Weg   23.65        31.29
## 397 11020412                          Hauptstraße   21.48        27.40
## 398 11020513                            Orankesee    6.21         6.75
## 399 11020514                   Große-Leege-Straße   21.81        33.59
## 400 11020515                    Landsberger Allee   23.63        31.69
## 401 11020516                          Weiße Taube    4.31         5.16
## 402 11030617            Hohenschönhausener Straße   32.21        48.84
## 403 11030618                       Fennpfuhl West   17.66        29.06
## 404 11030619                        Fennpfuhl Ost   24.97        34.11
## 405 11030720                       Herzbergstraße   12.65        27.15
## 406 11030721                        Rüdigerstraße   15.62        20.43
## 407 11030824                Frankfurter Allee Süd   22.40        36.25
## 408 11040925                        Victoriastadt   10.41        10.24
## 409 11040926                       Weitlingstraße   16.90        22.82
## 410 11041022                     Rosenfelder Ring   24.72        45.93
## 411 11041023                     Gensinger Straße   26.01        33.02
## 412 11041027                             Tierpark   26.05        36.05
## 413 11041128                          Sewanstraße   22.21        34.75
## 414 11051229                          Rummelsburg   13.94        11.26
## 415 11051330                      Karlshorst West    8.20        10.01
## 416 11051331                      Karlshorst Nord    7.15         5.13
## 417 11051332                       Karlshorst Süd    3.73         2.49
## 418 12103115                      Breitkopfbecken   31.29        47.23
## 419 12103116                       Hausotterplatz   33.99        47.45
## 420 12103117                           Letteplatz   33.62        48.26
## 421 12103218                          Teichstraße   29.63        44.70
## 422 12103219                           Schäfersee   32.04        49.91
## 423 12103220                       Humboldtstraße   19.78        30.65
## 424 12214125               Waldidyll/Flughafensee   12.37        20.60
## 425 12214126                            Tegel Süd   32.95        41.26
## 426 12214421                        Reinickes Hof   29.69        44.41
## 427 12214422                           Klixstraße   46.78        57.41
## 428 12214423                          Mellerbogen   26.43        37.80
## 429 12214424                    Scharnweberstraße   39.24        56.57
## 430 12214527                            Alt-Tegel   11.99        18.68
## 431 12214528                        Tegeler Forst    4.07         0.00
## 432 12224229                 Konradshöhe/Tegelort    4.27         4.71
## 433 12224230                          Heiligensee    4.98         5.37
## 434 12231101                            Hermsdorf    5.16         5.31
## 435 12231102                              Frohnau    4.04         4.38
## 436 12301203                         Wittenau Süd   22.00        31.25
## 437 12301204                        Wittenau Nord   11.88        19.07
## 438 12301205                        Waidmannslust    9.12         9.02
## 439 12301206                               Lübars    4.74         4.09
## 440 12302107                    Schorfheidestraße   15.33        25.65
## 441 12302108                   Märkisches Zentrum   48.78        56.83
## 442 12302109              Treuenbrietzener Straße   56.29        58.97
## 443 12302110                     Dannenwalder Weg   51.87        55.38
## 444 12302211                      Lübarser Straße   14.85        20.74
## 445 12302212                    Rollbergesiedlung   60.85        65.68
## 446 12304313                          Borsigwalde   18.18        30.41
## 447 12304314           Ziekowstraße/Freie Scholle   11.64        15.06

We can use the area ID for augmenting the planning areas with the welfare statistics. We’re joining a spatial with an ordinary dataframe, so we can use dplyr’s inner_join. Before that we can check that for each planning region we have welfare statistics information and vice versa:

setequal(bln_plan$areaid, bln_welfare$areaid)
## [1] TRUE
(bln <- inner_join(bln_plan, bln_welfare, by = 'areaid') %>%
  select(-name))
## Simple feature collection with 447 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 447 x 5
##     areaid                             geometry areaname    welfare welfare_chld
##      <int>                   <MULTIPOLYGON [m]> <chr>         <dbl>        <dbl>
##  1 1011101 (((387256.6 5818552, 387323.1 58185… Stülerstra…   10.1         15.4 
##  2 1011102 (((386767.5 5819393, 386768.3 58193… Großer Tie…    4.76         0   
##  3 1011103 (((387952.6 5818275, 387986.7 58183… Lützowstra…   22.2         36.8 
##  4 1011104 (((388847.1 5817875, 388855.5 58178… Körnerstra…   24.8         42.1 
##  5 1011105 (((388129.5 5819015, 388157.1 58190… Nördlicher…    2.82         3.53
##  6 1011201 (((389845.7 5819286, 389840.9 58193… Wilhelmstr…   12.1         19.0 
##  7 1011202 (((389851.2 5819673, 389847.5 58196… Unter den …    3.35        11.1 
##  8 1011203 (((390147.6 5819688, 390307.9 58197… Unter den …    6.59         6.25
##  9 1011204 (((390523.4 5819192, 390705.4 58192… Leipziger …   10.3         19.9 
## 10 1011301 (((389526.5 5820432, 389528.2 58204… Charitévie…    3.68         3.92
## # … with 437 more rows

Note that when joining spatial and ordinary dataframes, the order of arguments in the join function matters. If you have a spatial dataframe on the “left side” (x argument), the result will be a spatial dataframe. If you have an ordinary dataframe on the left side, the result will be an ordinary dataframe, i.e. the merged dataset loses its “spatial nature” and spatial operations won’t work with it any more (unless you convert it back to a spatial dataframe again with st_as_sf).

inner_join(bln_plan, bln_welfare, by = 'areaid') %>% class()
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"
inner_join(bln_welfare, bln_plan, by = 'areaid') %>% class()
## [1] "data.frame"

A quick plot confirms that it is similar to the one from the dashboard of the Helbig/Salomo study. I prefer using the base plot function for quick exploration of spatial data and usually only turn to ggplot2 for more advanced or “publication ready” plots. The help page for plot.sf provides some information about the arguments of this plotting function used for sf objects.

plot(bln['welfare_chld'])

hist(bln$welfare_chld,
     main = 'Histogram of percentage of children under 15 years\nwhose parents receive social welfare',
     xlab = '')

median(bln$welfare_chld)
## [1] 20.39
IQR(bln$welfare_chld)
## [1] 28.725

Public and private primary schools

Marcel Helbig, Rita Nikolai and me collected data on school locations in East Germany from 1992 to 2015 in order to analyze the development of the network of schools in East Germany and which role private schools play in it. We created an interactive map and published the data and are planning an update with newer data (until 2020), from which will we use an excerpt. This dataset provides school locations from 2019 as longitude/latitude WGS84 coordinates which we can load and convert into a spatial dataset using st_as_sf.

(schools <- read.csv('data/grundschulen_berlin_2019.csv', stringsAsFactors = FALSE) %>%
  st_as_sf(coords = c('lng', 'lat'), crs = 4326))  # EPSG 4326 is WGS84 lat/long coord.
## Simple feature collection with 435 features and 5 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 13.12773 ymin: 52.37591 xmax: 13.70305 ymax: 52.63857
## Geodetic CRS:  WGS 84
## First 10 features:
##    traeger priv_schule_typ                             name          ort   plz
## 1     oeff                       Grundschule am Arkonaplatz Berlin Mitte 10115
## 2     oeff                             Papageno-Grundschule Berlin Mitte 10115
## 3     oeff                        Kastanienbaum-Grundschule Berlin Mitte 10119
## 4     oeff                            Grundschule Neues Tor Berlin Mitte 10115
## 5     oeff                            GutsMuths-Grundschule Berlin Mitte 10179
## 6     oeff                 Grundschule am Brandenburger Tor Berlin Mitte 10117
## 7     oeff                                 City-Grundschule Berlin Mitte 10179
## 8     oeff                       Kurt-Tucholsky-Grundschule Berlin Mitte 10559
## 9     oeff                           Anne-Frank-Grundschule Berlin Mitte 10557
## 10    oeff                             Moabiter Grundschule Berlin Mitte 10557
##                     geometry
## 1  POINT (13.40039 52.53684)
## 2  POINT (13.39117 52.53274)
## 3   POINT (13.4018 52.52678)
## 4  POINT (13.38033 52.52759)
## 5  POINT (13.42395 52.51642)
## 6  POINT (13.38287 52.51312)
## 7  POINT (13.40892 52.50737)
## 8  POINT (13.35303 52.53026)
## 9  POINT (13.35537 52.51854)
## 10 POINT (13.35657 52.52197)

The variable traeger encodes whether a given facility is public (“oeff”) or private (“priv”) primary school. The only other important feature will be the school’s location, represented as point geometries that use the WGS84 CRS. The Berlin planning area data uses a different CRS, namely ETRS89 / UTM zone 33N. We need to make sure that both datasets use the same CRS, otherwise spatial operations such as spatial joins will fail. WGS84 uses spherical coordinates measured in degrees and calculations with these coordinates are quite complex because they happen on a curved surface. It’s better to use a CRS that uses planar coordinates for all the following spatial operations. The ETRS89 CRS used in the Berlin planning area data uses planar coordinates measured in meters, so we should transform the school locations to this CRS using st_transform:

schools <- mutate(schools, schoolid = 1:nrow(schools), .before = 1) %>%
  st_transform(crs = st_crs(bln_plan))
schools
## Simple feature collection with 435 features and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 372830.3 ymin: 5803709 xmax: 411830.8 ymax: 5833408
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
##    schoolid traeger priv_schule_typ                             name
## 1         1    oeff                       Grundschule am Arkonaplatz
## 2         2    oeff                             Papageno-Grundschule
## 3         3    oeff                        Kastanienbaum-Grundschule
## 4         4    oeff                            Grundschule Neues Tor
## 5         5    oeff                            GutsMuths-Grundschule
## 6         6    oeff                 Grundschule am Brandenburger Tor
## 7         7    oeff                                 City-Grundschule
## 8         8    oeff                       Kurt-Tucholsky-Grundschule
## 9         9    oeff                           Anne-Frank-Grundschule
## 10       10    oeff                             Moabiter Grundschule
##             ort   plz                 geometry
## 1  Berlin Mitte 10115   POINT (391508 5821952)
## 2  Berlin Mitte 10115 POINT (390872.9 5821510)
## 3  Berlin Mitte 10119 POINT (391578.9 5820831)
## 4  Berlin Mitte 10115 POINT (390124.5 5820954)
## 5  Berlin Mitte 10179 POINT (393056.4 5819646)
## 6  Berlin Mitte 10117 POINT (390260.6 5819340)
## 7  Berlin Mitte 10179 POINT (392014.3 5818662)
## 8  Berlin Mitte 10559 POINT (388279.4 5821292)
## 9  Berlin Mitte 10557 POINT (388408.4 5819986)
## 10 Berlin Mitte 10557 POINT (388498.5 5820365)

Public / private primary schools and poverty by statistical region

Both datasets use the same coordinate system now, so we can plot the school locations on top of the planning areas. I will use ggplot2 this time to make a choropleth map of the welfare_chld variable and overlay that with the public and private primary school locations.

ggplot() +
  geom_sf(aes(fill = welfare_chld), color = 'black', data = bln) +
  geom_sf(aes(color = traeger), size = 1, alpha = 0.75, data = schools) +
  scale_fill_binned(type = 'viridis', guide = guide_bins(title = '% Welfare')) +
  scale_color_manual(values = c('oeff' = '#8C2F92',  'priv' = '#928C2F'),
                     labels = c('public school', 'private school'),
                     guide = guide_legend(title = '')) +
  coord_sf(datum = NA) +  # disable graticule
  labs(title = "Public / private primary schools and poverty",
       subtitle = "Choropleth map of percentage of children whose parents obtain social welfare.\nDots represent primary schools.") +
  theme_minimal()

From the figure alone, it’s probably hard to assess whether there’s a pattern in the distribution of private and public schools regarding areas with higher percentage of social welfare recipients in the city. We can join the schools’ data with the socioeconomic information of the planning area they’re located in order to compare the social welfare statistics of regions around private schools with those around public schools. This can be done with a spatial join using st_join. By default, this function joins the spatial features of the first argument with features of the second argument when they intersect – in our case this means a school is linked with the planning area it’s located in. Note that the order of arguments matters here and that the spatial geometry of the first argument is retained in the resulting dataset.

(schools_plan <- st_join(schools, bln))
## Simple feature collection with 435 features and 10 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 372830.3 ymin: 5803709 xmax: 411830.8 ymax: 5833408
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
##    schoolid traeger priv_schule_typ                             name
## 1         1    oeff                       Grundschule am Arkonaplatz
## 2         2    oeff                             Papageno-Grundschule
## 3         3    oeff                        Kastanienbaum-Grundschule
## 4         4    oeff                            Grundschule Neues Tor
## 5         5    oeff                            GutsMuths-Grundschule
## 6         6    oeff                 Grundschule am Brandenburger Tor
## 7         7    oeff                                 City-Grundschule
## 8         8    oeff                       Kurt-Tucholsky-Grundschule
## 9         9    oeff                           Anne-Frank-Grundschule
## 10       10    oeff                             Moabiter Grundschule
##             ort   plz  areaid             areaname welfare welfare_chld
## 1  Berlin Mitte 10115 1011402          Arkonaplatz    4.46         3.53
## 2  Berlin Mitte 10115 1011401      Invalidenstraße    6.29         7.70
## 3  Berlin Mitte 10119 1011302 Oranienburger Straße    8.16        10.44
## 4  Berlin Mitte 10115 1011301       Charitéviertel    3.68         3.92
## 5  Berlin Mitte 10179 1011304      Karl-Marx-Allee   23.54        36.33
## 6  Berlin Mitte 10117 1011201        Wilhelmstraße   12.13        19.03
## 7  Berlin Mitte 10179 1011305   Heine-Viertel West   10.07        17.00
## 8  Berlin Mitte 10559 1022201          Stephankiez   20.31        34.06
## 9  Berlin Mitte 10557 1022206    Lüneburger Straße   12.11        13.47
## 10 Berlin Mitte 10557 1022206    Lüneburger Straße   12.11        13.47
##                    geometry
## 1    POINT (391508 5821952)
## 2  POINT (390872.9 5821510)
## 3  POINT (391578.9 5820831)
## 4  POINT (390124.5 5820954)
## 5  POINT (393056.4 5819646)
## 6  POINT (390260.6 5819340)
## 7  POINT (392014.3 5818662)
## 8  POINT (388279.4 5821292)
## 9  POINT (388408.4 5819986)
## 10 POINT (388498.5 5820365)

We can see that the schools’ data was linked with the data from the planning areas. We should also check whether there’s a school that was not located in any planning area (this may for example happen when point locations are very close to a border):

sum(is.na(schools_plan$areaid))
## [1] 0

All schools were linked with their planning region, so we can now compare the percentage of children whose parents obtain social welfare between public and private schools:

ggplot(schools_plan) +
  geom_violin(aes(x = traeger, y = welfare_chld), draw_quantiles = c(0.5)) +
  geom_jitter(aes(x = traeger, y = welfare_chld), alpha = 0.25) +
  scale_x_discrete(labels = c('oeff' = 'public primary schools', 'priv' = 'private primary schools')) +
  labs(title = 'Percentage of children whose parents obtain social welfare', x = '', y = '% welfare')

Our descriptive results indicate that the median percentage of children whose parents obtain social welfare is around six percent higher in the statistical regions around public schools than around private schools:

st_drop_geometry(schools_plan) %>%
  group_by(traeger) %>%
  summarise(median_welfare_chld = median(welfare_chld))
## # A tibble: 2 x 2
##   traeger median_welfare_chld
##   <chr>                 <dbl>
## 1 oeff                   22.8
## 2 priv                   16.8

This is an interesting descriptive result and we may continue with our spatial analysis from here. However, our current approach doesn’t consider the catchment area of a school correctly: Children from nearby planning areas will most likely visit a school but at the moment we only consider the one planning area in which a school is located. As an example, let’s zoom to school #391 “Evangelische Schule Berlin Buch” in the north of Berlin. As you can see, only considering the planning area in which this school is located omits the higher welfare rates in nearby areas:

ggplot(bln) +
  geom_sf(aes(fill = welfare_chld), color = 'black') +
  geom_sf(data = filter(schools, schoolid == 391), size  = 3, color = 'red') +
  scale_fill_binned(type = 'viridis', guide = guide_bins(title = '% Welfare')) +
  coord_sf(datum = st_crs(bln), xlim = c(395e3, 401e3), ylim = c(583e4, 5837e3)) +
  labs(title = 'School #391 "Evangelische Schule Berlin Buch" and\nsurrounding planning areas') +
  theme_minimal()

Using Voronoi regions as catchment areas

We can improve on this by using each schools’ catchment area and then taking the weighted average of the welfare variable from the intersections with the planning areas. You may be able to get spatial data on school catchment areas from administrative sources, but this is not often the case. So how could we define such a catchment area? One possibility would be to construct a circle around each school which represents the catchment area for a certain radius. However, it’s hard to justify a certain value for that radius and the radius for such a catchment area should probably vary depending on where the school is located (smaller catchment areas in inner city schools than for schools in the outskirts).

Another approach relies on Voronoi regions. They partition the space between given points so that the Voronoi region around each point covers an area of minimal distance to that origin point. In other words: the Voronoi region around a school is the area in which all the households are located that are closest to that school. It’s a reasonable assumption that catchment areas for schools can be approximated by Voronoi regions in Berlin since children are by law assigned to schools near their place of residence. Parents can try to register their children at another school, but priority is given to children that live closer to that school.

Voronoi regions can be generated with st_voronoi, which accepts the points as MULTIPOINT geometry object. The second argument is an envelope polygon for which we’ll use the the Berlin borders. The resulting object is a GEOMETRYCOLLECTION geometry object which we pass on to st_collection_extract and st_sfc in order to transform this to geometry set object that has the same CRS as our other spatial data (ETRS89).

bln_outline <- st_union(bln$geometry)  # Berlin borders

(voronois <- st_multipoint(st_coordinates(schools$geometry)) %>%
  st_voronoi(bln_outline) %>%
  st_collection_extract() %>%
  st_sfc(crs = st_crs(bln)))
## Geometry set for 435 features 
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 333829.7 ymin: 5764709 xmax: 450831.3 ymax: 5872408
## Projected CRS: ETRS89 / UTM zone 33N
## First 5 geometries:
## POLYGON ((333829.7 5824739, 333829.7 5843705, 3...
## POLYGON ((333829.7 5805022, 333829.7 5824739, 3...
## POLYGON ((373574.4 5820108, 374364.5 5821773, 3...
## POLYGON ((360399.8 5809719, 373736.3 5814453, 3...
## POLYGON ((373887.8 5764709, 333829.7 5764709, 3...

We can now plot the generated regions along with the school locations:

{
  plot(bln_outline)
  plot(schools$geometry, col = 'blue', pch = 19, add = TRUE)
  plot(voronois, border = 'red', col = NA, add = TRUE)
}

We can see that the Voronoi regions extend beyond the borders of Berlin so we should take the intersection between the Voronoi regions and the Berlin border in order to “cut” these regions:

bln_vor <- st_intersection(voronois, bln_outline)

{
  plot(bln_vor, border = 'red', col = NA)
  plot(schools$geometry, col = 'blue', pch = 19, add = TRUE)
}

Let’s overlay the planning areas with the schools’ Voronoi regions to see how they differ. Blue dots represent the schools.

{
  plot(bln$geometry)
  plot(bln_vor, col = '#80000055', border = 'white', add = TRUE)
  plot(schools$geometry, col = 'blue', pch = 19, cex = 0.25, add = TRUE)
}

The goal is now to calculate the weighted average of the welfare rate for a given school by taking into account all planning areas that the school’s Voronoi region intersects with. The weights will be determined by the intersection area between the Voronoi region and the planning areas. I will first do this with a single school only to illustrate how it works. This school will be #270 “Müggelheimer Schule” located in the south east:

(exampleschool <- schools[schools$schoolid == 270,])
## Simple feature collection with 1 feature and 6 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 409354.2 ymin: 5807970 xmax: 409354.2 ymax: 5807970
## Projected CRS: ETRS89 / UTM zone 33N
##     schoolid traeger priv_schule_typ                              name
## 270      270    oeff                 Müggelheimer Schule (Grundschule)
##                         ort   plz                 geometry
## 270 Berlin Treptow-Köpenick 12559 POINT (409354.2 5807970)
{
  plot(bln$geometry)
  plot(bln_vor, col = '#80000055', border = 'white', add = TRUE)
  plot(exampleschool$geometry, col = 'red', pch = 19, add = TRUE)
}

First, we need the Voronoi region of that school. We can again apply st_join for this in order to get the Voronoi region that intersects with the school. st_join only works with spatial dataframes so we need to convert the geometry set object via st_as_sf. Note that the Voronoi regions should be the first argument in the st_join function since we want to retain the region’s geometry in the resulting dataset. We also use an inner join instead of a left join by setting left = FALSE so that the result set only contains the single Voronoi region that intersects with the school.

(examplevor <- st_join(st_as_sf(bln_vor), exampleschool, left = FALSE))
## Simple feature collection with 1 feature and 6 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 406269.6 ymin: 5805035 xmax: 413198.9 ymax: 5810714
## Projected CRS: ETRS89 / UTM zone 33N
##   schoolid traeger priv_schule_typ                              name
## 1      270    oeff                 Müggelheimer Schule (Grundschule)
##                       ort   plz                       geometry
## 1 Berlin Treptow-Köpenick 12559 POLYGON ((406269.6 5806871,...

We can see that the extracted Voronoi region seems to be correct:

{
  plot(examplevor$geometry)
  plot(exampleschool$geometry, col = 'red', pch = 19, add = TRUE)
}

The next step is to get the intersections between the planning areas and Voronoi regions, i.e. “to cut” the planning areas according to the school’s Voronoi region. We do this with the help of st_intersection, which performs the intersection between spatial objects. The result is a spatial dataframe of the six planning regions that overlap with the school’s Voronoi region:

(exampleareas <- st_intersection(bln, examplevor))
## Simple feature collection with 6 features and 10 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 406269.6 ymin: 5805035 xmax: 413198.9 ymax: 5810714
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 6 x 11
##    areaid areaname  welfare welfare_chld schoolid traeger priv_schule_typ name  
## *   <int> <chr>       <dbl>        <dbl>    <int> <chr>   <chr>           <chr> 
## 1 9031101 Grünau       8.8         10.4       270 oeff    ""              Mügge…
## 2 9031201 Karoline…    2.01         1.73      270 oeff    ""              Mügge…
## 3 9031202 Schmöckw…    6.25         9.47      270 oeff    ""              Mügge…
## 4 9041301 Kietzer …   12.8         16.8       270 oeff    ""              Mügge…
## 5 9041601 Müggelhe…    3.68         4.54      270 oeff    ""              Mügge…
## 6 9051801 Rahnsdor…    5.89         5.67      270 oeff    ""              Mügge…
## # … with 3 more variables: ort <chr>, plz <int>, geometry <GEOMETRY [m]>
{
  plot(exampleareas$geometry)
  plot(exampleschool$geometry, col = 'red', pch = 19, add = TRUE)
}

We can but that a little bit into perspective again and display this on the Berlin planning regions map overlayed with the schools’ Voronoi regions:

ggplot() +
  geom_sf(color = 'black', data = bln) +
  geom_sf(fill = NA, color = 'red', linetype = 'dotted', data = bln_vor) +
  geom_sf(aes(fill = welfare_chld), color = 'black', data = exampleareas) +
  geom_sf(fill = NA, color = 'red', data = examplevor) +
  geom_sf(color = 'red', size = 3, data = exampleschool) +
  scale_fill_continuous(type = 'viridis', guide = guide_colorbar(title = '% Welfare')) +
  coord_sf(datum = NA) +
  labs(title = "Berlin statistical regions and Voronoi regions of schools",
       subtitle = "Highlighted school #270 with surrounding Voronoi region and planning areas intersection.") +
  theme_minimal()

All that is left now for our example school is to take the weighted average of the welfare rate. The weights are the area of the planning area intersections so that planning areas with larger overlap in the Voronoi region have a higher influence on the overall average. The following shows the planning area intersections along with their area as calculated via st_area. We can see that the welfare rate of ~5% in Müggelheim will have the largest weight, followed by the ~17% rate in Kietzer Feld/Nachtheide:

cbind(exampleareas[c('areaname', 'welfare_chld')], area = st_area(exampleareas))
## Simple feature collection with 6 features and 3 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 406269.6 ymin: 5805035 xmax: 413198.9 ymax: 5810714
## Projected CRS: ETRS89 / UTM zone 33N
##                       areaname welfare_chld              area
## 1                       Grünau        10.39   318930.01 [m^2]
## 2                 Karolinenhof         1.73    15937.56 [m^2]
## 3 Schmöckwitz/Rauchfangswerder         9.47   757552.06 [m^2]
## 4      Kietzer Feld/Nachtheide        16.75  6117595.30 [m^2]
## 5                   Müggelheim         4.54 13633749.67 [m^2]
## 6       Rahnsdorf/Hessenwinkel         5.67   216763.29 [m^2]
##                         geometry
## 1 POLYGON ((406451.3 5807406,...
## 2 MULTIPOLYGON (((406681.9 58...
## 3 POLYGON ((410293.6 5805317,...
## 4 POLYGON ((408799.1 5810501,...
## 5 POLYGON ((406890.4 5806615,...
## 6 MULTIPOLYGON (((410155.4 58...

We pass these area measurements to weighted.mean (stripping the m² unit via as.numeric since weighted.mean can’t handle it) and obtain a weighted average welfare rate of ~8.4% which is quite a bit higher than the ~4.5% we get when using the former approach (linking the school with its planning area “Müggelheim”):

weighted.mean(exampleareas$welfare_chld, as.numeric(st_area(exampleareas)))
## [1] 8.362149
# former approach: linking the school with its planning area
schools_plan[schools_plan$schoolid == 270, ]$welfare_chld
## [1] 4.54

We’ll next perform these calculations for all schools. First, we find the school inside each Voronoi region using st_join as before:

(school_vor <- st_join(st_as_sf(bln_vor), schools))
## Simple feature collection with 435 features and 6 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
##    schoolid traeger priv_schule_typ
## 1       140    oeff                
## 2       155    oeff                
## 3       143    oeff                
## 4       145    oeff                
## 5       161    oeff                
## 6       369    priv         Waldorf
## 7       212    oeff                
## 8       183    oeff                
## 9       403    priv            Frei
## 10      142    oeff                
##                                             name                         ort
## 1                             Linden-Grundschule              Berlin Spandau
## 2                       Mary-Poppins-Grundschule              Berlin Spandau
## 3                    Astrid-Lindgren-Grundschule              Berlin Spandau
## 4                      Grundschule am Ritterfeld              Berlin Spandau
## 5                    Conrad-Schule (Grundschule)  Berlin Steglitz-Zehlendorf
## 6  Freie Waldorfschule Havelhöhe - Eugen Kolisko              Berlin Spandau
## 7                     Käthe-Kollwitz-Grundschule Berlin Tempelhof-Schöneberg
## 8                           Mercator-Grundschule  Berlin Steglitz-Zehlendorf
## 9                           Immanuel-Grundschule              Berlin Spandau
## 10                          Zeppelin-Grundschule              Berlin Spandau
##      plz                       geometry
## 1  13591 POLYGON ((372875.8 5823264,...
## 2  14089 POLYGON ((373855.6 5817213,...
## 3  13591 POLYGON ((373717.9 5820410,...
## 4  14089 POLYGON ((371755 5813750, 3...
## 5  14109 POLYGON ((371780.3 5810871,...
## 6  14089 POLYGON ((373736.3 5814453,...
## 7  12307 POLYGON ((389105.6 5805524,...
## 8  12207 POLYGON ((384245 5808272, 3...
## 9  13589 POLYGON ((374401.3 5825098,...
## 10 13591 POLYGON ((374299.7 5822374,...

Next we calculate the planning area intersections, their areas and weighted average of the welfare rate for each school’s Voronoi region using sapply. This computation takes some seconds to complete and in the end adds the weighted average of the welfare rate as welfare_chld variable to the schools’ Voronoi region dataset:

school_vor$welfare_chld <- sapply(school_vor$geometry, function(vor) {
  # the Voronoi polygon "vor" loses the CRS during sapply -> set it here again
  vor <- st_sfc(vor, crs = st_crs(bln))
  areas <- st_intersection(bln, vor)
  weighted.mean(areas$welfare_chld, as.numeric(st_area(areas)))
})

school_vor
## Simple feature collection with 435 features and 7 fields
## Geometry type: GEOMETRY
## Dimension:     XY
## Bounding box:  xmin: 370000.7 ymin: 5799520 xmax: 415785.3 ymax: 5837259
## Projected CRS: ETRS89 / UTM zone 33N
## First 10 features:
##    schoolid traeger priv_schule_typ
## 1       140    oeff                
## 2       155    oeff                
## 3       143    oeff                
## 4       145    oeff                
## 5       161    oeff                
## 6       369    priv         Waldorf
## 7       212    oeff                
## 8       183    oeff                
## 9       403    priv            Frei
## 10      142    oeff                
##                                             name                         ort
## 1                             Linden-Grundschule              Berlin Spandau
## 2                       Mary-Poppins-Grundschule              Berlin Spandau
## 3                    Astrid-Lindgren-Grundschule              Berlin Spandau
## 4                      Grundschule am Ritterfeld              Berlin Spandau
## 5                    Conrad-Schule (Grundschule)  Berlin Steglitz-Zehlendorf
## 6  Freie Waldorfschule Havelhöhe - Eugen Kolisko              Berlin Spandau
## 7                     Käthe-Kollwitz-Grundschule Berlin Tempelhof-Schöneberg
## 8                           Mercator-Grundschule  Berlin Steglitz-Zehlendorf
## 9                           Immanuel-Grundschule              Berlin Spandau
## 10                          Zeppelin-Grundschule              Berlin Spandau
##      plz                       geometry welfare_chld
## 1  13591 POLYGON ((372875.8 5823264,...    17.248205
## 2  14089 POLYGON ((373855.6 5817213,...     3.720224
## 3  13591 POLYGON ((373717.9 5820410,...    42.571484
## 4  14089 POLYGON ((371755 5813750, 3...     3.136074
## 5  14109 POLYGON ((371780.3 5810871,...     4.300000
## 6  14089 POLYGON ((373736.3 5814453,...     4.497095
## 7  12307 POLYGON ((389105.6 5805524,...    20.714686
## 8  12207 POLYGON ((384245 5808272, 3...    32.495663
## 9  13589 POLYGON ((374401.3 5825098,...    32.457281
## 10 13591 POLYGON ((374299.7 5822374,...     8.607028
plot(school_vor['welfare_chld'], main = 'Weighted average of percentage of children\nwhose parents receive social welfare per school Voronoi region')

We again compare public and private schools, this time with our revised calculations:

ggplot(school_vor) +
  geom_violin(aes(x = traeger, y = welfare_chld), draw_quantiles = c(0.5)) +
  geom_jitter(aes(x = traeger, y = welfare_chld), alpha = 0.25)

The median percentage of children whose parents obtain social welfare is still higher for public schools, but the difference is now four instead of six percent.

st_drop_geometry(school_vor) %>%
  group_by(traeger) %>%
  summarise(median_welfare_chld = median(welfare_chld))
## # A tibble: 2 x 2
##   traeger median_welfare_chld
##   <chr>                 <dbl>
## 1 oeff                   20.9
## 2 priv                   17.0

Limitations and conclusion